Deep learning has been utilized for the statistical downscaling of climate data. Specifically, a two-dimensional (2D) convolutional neural network (CNN) has been successfully applied to precipitation estimation. This study implements a three-dimensional (3D) CNN to estimate watershed-scale daily precipitation from 3D atmospheric data and compares the results with those of the 2D CNN. The 2D CNN is extended along the time direction (3D-CNN-Time) and along the vertical direction (3D-CNN-Vert). The precipitation estimates of these extended CNNs are compared with those of the 2D CNN in terms of root-mean-square error (RMSE), Nash-Sutcliffe efficiency (NSE), and RMSE at the 99th percentile. Both 3D-CNN-Time and 3D-CNN-Vert are found to improve the accuracy of precipitation estimation compared with the 2D CNN. 3D-CNN-Vert provides the best estimates in both the training and test periods in terms of RMSE and NSE.
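To make the 2D-versus-3D distinction concrete, here is a minimal PyTorch sketch. It is not the paper's architecture: all layer sizes, grid shapes, and the single-value regression head are assumptions. The 2D baseline treats atmospheric variables as image channels on a lat-lon grid, while the 3D variant adds a depth axis that can hold either time steps (as in 3D-CNN-Time) or vertical levels (as in 3D-CNN-Vert).

```python
# Sketch only: layer sizes and input shapes are assumptions, not the paper's setup.
import torch
import torch.nn as nn

class CNN2D(nn.Module):
    """Baseline: atmospheric variables stacked as input channels."""
    def __init__(self, n_vars: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_vars, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),  # daily basin-scale precipitation
        )

    def forward(self, x):  # x: (batch, n_vars, lat, lon)
        return self.net(x)

class CNN3D(nn.Module):
    """3D extension: the extra depth axis holds either time steps
    (3D-CNN-Time) or vertical pressure levels (3D-CNN-Vert)."""
    def __init__(self, n_vars: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(n_vars, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):  # x: (batch, n_vars, depth, lat, lon)
        return self.net(x)

x2d = torch.randn(8, 5, 16, 16)     # 5 variables on a 16x16 grid
x3d = torch.randn(8, 5, 4, 16, 16)  # plus 4 time steps or 4 levels
print(CNN2D(5)(x2d).shape, CNN3D(5)(x3d).shape)  # (8, 1) (8, 1)
```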
This study investigates the relationships that deep learning methods can identify between input and output data. As a case study, rainfall-runoff modeling in a snow-dominated watershed by means of a long short-term memory (LSTM) network was selected. Daily precipitation and mean air temperature were used as model inputs to estimate daily flow discharge. After model training and validation, two experimental simulations were conducted with hypothetical inputs instead of the observed meteorological data to clarify the response of the trained model to its inputs. The first numerical experiment showed that, even without input precipitation, the trained model generated flow discharge, particularly winter low flow and high flow during the snowmelt period. The effects of warmer and colder conditions on flow discharge were also reproduced by the trained model without precipitation. Furthermore, the model reflected only 17-39% of the total precipitation mass during the snow accumulation period in the total annual flow discharge, revealing a strong lack of water mass conservation. The results of this study indicate that deep learning methods may not properly learn the explicit physical relationships between input and target variables, even though they remain capable of achieving a strong goodness of fit.
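The hypothetical-input experiment is simple to reproduce in outline. Below is a minimal sketch, not the authors' code: a toy LSTM with the paper's two forcings (daily precipitation and mean air temperature) is run once on observed inputs and once with precipitation zeroed out; the model size and all tensor shapes are assumptions.

```python
# Sketch only: not the paper's trained model; shapes and sizes are assumed.
import torch
import torch.nn as nn

class RunoffLSTM(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):  # x: (batch, days, 2) = [precipitation, air temp]
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)  # daily flow discharge

model = RunoffLSTM()  # assume weights were trained on observed data
forcing = torch.randn(1, 365, 2)  # one year of daily forcing
no_precip = forcing.clone()
no_precip[..., 0] = 0.0           # hypothetical run: precipitation removed

with torch.no_grad():
    q_obs = model(forcing)
    q_hyp = model(no_precip)

# If q_hyp still shows winter low flow and a snowmelt peak, the model is
# not relying on the precipitation input alone -- the paper's key observation.
print((q_obs - q_hyp).abs().mean())
```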
This study proposes two straightforward yet effective approaches to reduce the computational time required for time series modeling with recurrent neural networks (RNNs) by using multi-time-scale time series data as input. One approach feeds coarse and fine temporal resolutions of the input time series to the RNN in parallel. The other concatenates the coarse and fine temporal resolutions of the input time series before treating them as the RNN input. In both approaches, the finer-resolution data are first utilized to learn the fine-scale behavior of the target data; the coarser-resolution data are then expected to capture long-duration dependencies between the input and target variables. The proposed approaches were implemented for hourly rainfall-runoff modeling in a snow-dominated watershed by employing a long short-term memory (LSTM) network, a type of RNN. Daily and hourly meteorological data were used as inputs, and hourly flow discharge was treated as the target. The results confirm that both proposed approaches can significantly reduce the computational time of RNN training (by up to a factor of 32.4). Furthermore, one of the proposed approaches also improves the estimation accuracy.
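A minimal sketch of the two input schemes follows, under assumed shapes and layer sizes (this is not the paper's implementation): a parallel variant with one LSTM branch per temporal resolution, and a concatenation variant that repeats daily values up to hourly steps and stacks them feature-wise before a single LSTM.

```python
# Sketch only: branch structure and all sizes are assumptions.
import torch
import torch.nn as nn

class ParallelLSTM(nn.Module):
    """Separate branch per time resolution, merged at the output head."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.fine = nn.LSTM(2, hidden, batch_first=True)    # hourly inputs
        self.coarse = nn.LSTM(2, hidden, batch_first=True)  # daily inputs
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x_hr, x_day):
        h_hr, _ = self.fine(x_hr)            # (B, T_hr, H)
        _, (h_day, _) = self.coarse(x_day)   # summary of the coarse series
        h_day = h_day[-1].unsqueeze(1).expand(-1, x_hr.size(1), -1)
        return self.head(torch.cat([h_hr, h_day], dim=-1)).squeeze(-1)

class ConcatLSTM(nn.Module):
    """Daily values repeated to hourly steps and concatenated as features."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x_hr, x_day):
        x_day_up = x_day.repeat_interleave(24, dim=1)  # daily -> hourly steps
        h, _ = self.lstm(torch.cat([x_hr, x_day_up], dim=-1))
        return self.head(h).squeeze(-1)

x_hr = torch.randn(4, 48, 2)   # 48 hourly [rainfall, temperature] steps
x_day = torch.randn(4, 2, 2)   # the same 2 days at daily resolution
print(ParallelLSTM()(x_hr, x_day).shape, ConcatLSTM()(x_hr, x_day).shape)
```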
Benefiting from its intrinsic ability to exploit supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe two drawbacks in the positive- and negative-sample construction mechanisms that prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes apart those from other clusters by maximizing the cross-view cosine similarity of positive pairs and minimizing that of negative pairs. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
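The objective can be sketched compactly. The following is a hedged reconstruction from the abstract alone, not the released CCGC code: cross-view embeddings of the same node form positive pairs, and the centers of the other high-confidence clusters serve as negatives; the InfoNCE-style aggregation over negatives is an assumption.

```python
# Sketch only: a guess at the loss shape from the abstract, not CCGC's code.
import torch
import torch.nn.functional as F

def ccgc_style_loss(z1, z2, labels, centers):
    """z1, z2: (N, d) embeddings of the same nodes from the two views
    (Siamese encoders with unshared weights); labels: (N,) high-confidence
    cluster ids; centers: (K, d) cluster centers used as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    centers = F.normalize(centers, dim=-1)

    pos = (z1 * z2).sum(-1)                 # cross-view cosine similarity
    sim_to_centers = z1 @ centers.T         # (N, K)
    mask = F.one_hot(labels, centers.size(0)).bool()
    neg = sim_to_centers.masked_fill(mask, float('-inf'))  # drop own center

    # maximize positive similarity, minimize similarity to other centers
    return (-pos + torch.logsumexp(neg, dim=-1)).mean()

z1, z2 = torch.randn(16, 8), torch.randn(16, 8)
labels = torch.randint(0, 3, (16,))
centers = torch.randn(3, 8)
print(ccgc_style_loss(z1, z2, labels, centers))
```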
Recent studies have shown that using an external Language Model (LM) benefits end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear infrequently in the training set remains quite challenging. Long-tail prediction problems have been widely studied in many applications, but have been addressed by only a few studies for ASR and LMs. In this paper, we propose a new memory-augmented, lookup-dictionary-based Transformer architecture for the LM. The newly introduced lookup dictionary incorporates rich contextual information from the training set, which is vital for correctly predicting long-tail tokens. In extensive experiments on Chinese and English data sets, our proposed method outperforms the baseline Transformer LM by a large margin on both word/character error rate and tail-token error rate, without any impact on decoding efficiency. Overall, we demonstrate the effectiveness of the proposed method in boosting ASR decoding performance, especially for long-tail tokens.
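The abstract does not spell out the dictionary mechanics, so the sketch below is only a guess at the flavor of the idea, in the spirit of kNN-augmented language models: cache (context embedding, next token) pairs from the training set, then interpolate the Transformer's output distribution with one built from the nearest cached contexts. Every name and hyperparameter here is an assumption.

```python
# Sketch only: a kNN-LM-style stand-in, NOT the paper's lookup dictionary.
import torch
import torch.nn.functional as F

def lookup_augmented_probs(logits, h, keys, values, vocab, k=8, lam=0.3, tau=1.0):
    """logits: (V,) LM logits; h: (d,) current context embedding;
    keys: (M, d) cached context embeddings; values: (M,) cached next tokens."""
    p_lm = F.softmax(logits, dim=-1)
    dists = ((keys - h) ** 2).sum(-1)          # L2 distance to the cache
    knn = dists.topk(k, largest=False)
    w = F.softmax(-knn.values / tau, dim=-1)   # closer contexts weigh more
    p_knn = torch.zeros(vocab).scatter_add_(0, values[knn.indices], w)
    return (1 - lam) * p_lm + lam * p_knn      # boosts rare cached tokens

V, d, M = 100, 16, 500
print(lookup_augmented_probs(torch.randn(V), torch.randn(d),
                             torch.randn(M, d), torch.randint(0, V, (M,)), V))
```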
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing the prior state of the art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
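As background on the general mechanism, here is a minimal sketch of soft prompt tuning, the family of techniques to which instruction prompt tuning belongs: a small matrix of learnable virtual-token embeddings is prepended to the frozen model's input embeddings, and only that matrix is trained on the new domain. This illustrates the idea, not Med-PaLM's actual recipe or scale.

```python
# Sketch only: generic soft prompting, not Med-PaLM's implementation.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_virtual: int, d_model: int):
        super().__init__()
        # The only trainable parameters; the base LM stays frozen.
        self.prompt = nn.Parameter(torch.randn(n_virtual, d_model) * 0.02)

    def forward(self, token_embeds):  # token_embeds: (batch, seq, d_model)
        batch = token_embeds.size(0)
        prefix = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prefix, token_embeds], dim=1)

embeds = torch.randn(2, 10, 64)   # frozen LM's token embeddings
soft = SoftPrompt(n_virtual=20, d_model=64)
print(soft(embeds).shape)         # (2, 30, 64); only soft.prompt is trained
```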
Summary quality assessment metrics fall into two categories: reference-based and reference-free. Reference-based metrics are theoretically more accurate but are limited by the availability and quality of human-written references, both of which are difficult to ensure. This has inspired the development, in the past few years, of reference-free metrics, which are independent of human-written references. However, existing reference-free metrics cannot be both zero-shot and accurate. In this paper, we propose a zero-shot yet accurate reference-free approach in a sneaky way: feeding the documents from which summaries are generated as references into reference-based metrics. Experimental results show that this zero-shot approach yields the best-performing reference-free metrics on nearly all aspects of several recently released datasets, sometimes even beating reference-free metrics specifically trained for the task. We further investigate which reference-based metrics benefit from such repurposing and whether our additional tweaks help.
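The repurposing trick itself fits in a few lines. In the sketch below, a toy unigram F1 stands in for whichever reference-based metric is repurposed; the point is only where the document goes, namely into the reference slot.

```python
# Sketch only: a toy unigram F1 as a stand-in reference-based metric.
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

document = "The storm flooded the valley and forced hundreds to evacuate."
summary = "A storm flooded the valley, forcing evacuations."

# Reference-free usage: the source document plays the reference's role.
print(unigram_f1(summary, document))
```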
Ultra-fine entity typing (UFET) predicts extremely free-form types (e.g., president, politician) of a given entity mention (e.g., Joe Biden) in context. State-of-the-art (SOTA) methods use a cross-encoder (CE) based architecture: CE concatenates the mention (and its context) with each type and feeds the pairs into a pretrained language model (PLM) to score their relevance. This brings deeper interaction between the mention and the types, and thus better performance, but it requires N (the type set size) forward passes to infer the types of a single mention. CE is therefore very slow at inference when the type set is large (e.g., N = 10k for UFET). To this end, we propose to perform entity typing in a recall-expand-filter manner. The recall and expand stages prune the large type set and generate the K (typically fewer than 256) most relevant type candidates for each mention. At the filter stage, we use a novel model called MCCE to concurrently encode and score these K candidates in a single forward pass to obtain the final type prediction. We investigate different variants of MCCE, and extensive experiments show that MCCE under our paradigm reaches SOTA performance on ultra-fine entity typing while being thousands of times faster than the cross-encoder. We also find MCCE to be very effective in fine-grained (130 types) and coarse-grained (9 types) entity typing. Our code is available at \url{https://github.com/modelscope/AdaSeq/tree/master/examples/MCCE}.
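The one-forward-pass idea can be sketched as follows. This is a toy stand-in for MCCE, not its actual architecture: candidate-type embeddings are appended to the encoded mention sequence, a single encoder call lets them attend to the mention and to each other, and each candidate position is read out as a score.

```python
# Sketch only: a toy encoder, not MCCE; layer sizes are assumptions.
import torch
import torch.nn as nn

class ConcurrentScorer(nn.Module):
    def __init__(self, d: int = 64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d, 1)

    def forward(self, mention_tokens, cand_embeds):
        # mention_tokens: (B, T, d); cand_embeds: (B, K, d)
        x = torch.cat([mention_tokens, cand_embeds], dim=1)  # one sequence
        h = self.encoder(x)                                  # one forward pass
        h_cand = h[:, mention_tokens.size(1):, :]            # candidate slots
        return self.score(h_cand).squeeze(-1)                # (B, K) scores

mention = torch.randn(2, 16, 64)  # encoded mention plus context
cands = torch.randn(2, 128, 64)   # K = 128 recalled type candidates
print(ConcurrentScorer()(mention, cands).shape)  # (2, 128): one pass, not 128
```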
Prior works on Information Extraction (IE) typically predict different tasks and instances (e.g., event triggers, entities, roles, relations) independently, neglecting their interactions and leading to model inefficiency. In this work, we introduce a joint IE framework, HighIE, that learns and predicts multiple IE tasks by integrating high-order cross-task and cross-instance dependencies. Specifically, we design two categories of high-order factors: homogeneous factors and heterogeneous factors. These factors are then utilized to jointly predict the labels of all instances. To address the intractability of exact high-order inference, we incorporate a high-order neural decoder that is unfolded from a mean-field variational inference method. The experimental results show that our approach achieves consistent improvements on three IE tasks compared with our baseline and prior work.
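Unfolding mean-field inference into a decoder can be illustrated with a toy version. The sketch below uses a single shared pairwise compatibility matrix over fully connected instances, which is a simplification; HighIE's homogeneous and heterogeneous factors couple specific task and instance pairs.

```python
# Sketch only: a toy mean-field unrolling, not HighIE's factor design.
import torch
import torch.nn.functional as F

def mean_field_decode(unary, pairwise, n_iters: int = 3):
    """unary: (N, L) per-instance label scores; pairwise: (L, L) compatibility
    between labels of interacting instances (shared here for simplicity)."""
    q = F.softmax(unary, dim=-1)              # initial label distributions
    for _ in range(n_iters):
        # message to each instance: others' expected label compatibilities
        msg = q @ pairwise                    # (N, L)
        msg = msg.sum(0, keepdim=True) - msg  # exclude the self-message
        q = F.softmax(unary + msg, dim=-1)    # unrolled mean-field update
    return q

unary = torch.randn(5, 4)    # 5 instances, 4 candidate labels
pairwise = torch.randn(4, 4)
print(mean_field_decode(unary, pairwise).argmax(-1))
```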
Dense retrievers have made significant strides in obtaining state-of-the-art results on text retrieval and open-domain question answering (ODQA). Yet most of these achievements were made possible with the help of large annotated datasets, and unsupervised learning for dense retrieval models remains an open problem. In this work, we explore two categories of methods for creating pseudo query-document pairs, query extraction (QExt) and transferred query generation (TQGen), to augment retriever training in an annotation-free and scalable manner. Specifically, QExt extracts pseudo queries from document structures or by selecting salient random spans, while TQGen utilizes generation models trained for other NLP tasks (e.g., summarization) to produce pseudo queries. Extensive experiments show that dense retrievers trained with the individual augmentation methods perform comparably to multiple strong baselines, and that combining them leads to further improvements, achieving state-of-the-art unsupervised dense retrieval performance on both BEIR and ODQA datasets.
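QExt in particular is easy to sketch. Below is a minimal illustration under assumed span lengths, not the paper's exact extractors: one pseudo query from document structure (the first sentence standing in for a title) and one from a random span.

```python
# Sketch only: toy extractors; span lengths and structural cues are assumed.
import random

def pseudo_query_structure(doc: str) -> str:
    return doc.split(".")[0]  # structural cue, e.g. a title or lead sentence

def pseudo_query_random_span(doc: str, min_len: int = 4, max_len: int = 8) -> str:
    words = doc.split()
    length = random.randint(min_len, min(max_len, len(words)))
    start = random.randint(0, len(words) - length)
    return " ".join(words[start:start + length])

doc = ("Dense retrievers map queries and documents into a shared vector "
       "space. Training them without labels requires pseudo query-document "
       "pairs mined from the documents themselves.")
print(pseudo_query_structure(doc))
print(pseudo_query_random_span(doc))  # (query, doc) pairs train the retriever
```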